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Abstract 

The recently developed selected columns of the density matrix (SCDM) method 
[J. Chem. Theory Comput. 11, 1463, 2015] is a simple, robust, efficient and 
highly parallelizable method for constructing localized orbitals from a set of 
delocalized Kohn-Sham orbitals for insulators and semiconductors ■with T point 
sampling of the Brillouin zone. In this work we generalize the SCDM method 
to Kohn-Sham density functional theory calculations with k-point sampling 
of the Brillouin zone, which is needed for more general electronic structure 
calculations for solids. We demonstrate that our new method, called SCDM-k, 
is by construction gauge independent and a natural way to describe localized 
orbitals. SCDM-k computes localized orbitals without the use of an optimization 
procedure, and thus does not suffer from the possibility of being trapped in a 
local minimum. Furthermore, the computational complexity of using SCDM-k 
to construct orthogonal and localized orbitals scales as 0{N log N) where N is 
the total number of k-points in the Brillouin zone. SCDM-k is therefore efficient 
even when a large number of k-points are used for Brillouin zone sampling. We 
demonstrate the numerical performance of SCDM-k using systems with model 
potentials in two and three dimensions. 

Keywords: Kohn-Sham density functional theory. Localized orbitals, 

Brillouin zone sampling. Density matrix, Interpolative decomposition 


1. Introduction 

Kohn-Sham density functional theory (DFT) [Tl|2] is the most widely used 
electronic structure theory for molecules and systems in condensed phase. The 
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Kohn-Sham orbitals (a.k.a. Kohn-Sham wavefunctions) are eigenfunctions of 
the Kohn-Sham Hamiltonian. We refer to the span of a given set of Kohn-Sham 
orbitals as the Kohn-Sham invariant subspace. These orbitals are in general 
delocalized, i.e. each orbital has significant magnitude across the entire com¬ 
putational domain. However, information about atomic structure and chemical 
bonding, which is often localized in real space, may be difficult to interpret 
from delocalized Kohn-Sham orbitals. The connection between localized and 
delocalized information is made possible by a localization procedure. 

A localization procedure finds a set of orbitals that are localized in real space, 
and span the Kohn-Sham invariant subspace. Examples of widely used local¬ 
ization schemes include Boys localization [3] mostly in the context of chemistry, 
and maximally localized Wannier functions (MLWFs) [HIS] mostly in the con¬ 
text of physics and materials science. The localized orbitals are not only useful 
for analyzing chemical and materials systems, but can also serve as powerful 
computational tools for hybrid functional calculations li[Z]> theory of polariza¬ 
tion of crystalline solids based on Berry-phase calculations [5|, interpolation of 
band structure |1|, linear scaling DFT calculations |5], and excited state theo¬ 
ries nniiii] among others. Because of the wide range of applications for localized 
orbitals, several other localization methods have also been proposed in the past 
few years [T3HT3] . 

The potential for constructing localized orbitals from delocalized Kohn-Sham 
orbitals can be justified physically by the “nearsightedness” principle for elec¬ 
tronic matter of finite HOMO-LUMO gap [TTl 113] . The nearsightedness prin¬ 
ciple can be more rigorously stated as the single particle density matrix (DM) 
being exponentially localized along the off-diagonal direction in its real space 
representation na m I1SH131- Based on the exponential decay of the DM in 
the real space, we have recently developed the selected columns of the den¬ 
sity matrix (SCDM) method [TB] as a new way to construct localized orbitals. 
The method is simple, robust, efficient and highly parallelizable. As the name 
suggests, the localized orbitals are obtained directly from a column selection 
procedure implicitly applied to the density matrix. Hence, the locality of these 
columns is a direct consequence of the locality of the density matrix. In contrast 
with Boys localized orbitals or MLWFs, our method does not attempt to mini¬ 
mize a given localization measure via a minimization procedure. Consequently, 
our method does not require any initial guess of localized orbitals, and its cost 
is predetermined for a given problem size. It also avoids some of the potential 
problems associated with a minimization scheme, such as getting stuck at a 
local minimum. 

For isolated molecules, the number of electrons is relatively small. On the 
other hand, the number of electrons in solids can reach macroscopic scale, and 
the calculation must be simplified. Using the fact that the potential and the 
electron density are periodic with respect to the unit cell of a solid system, 
one can perform a Bloch decomposition of the Kohn-Sham Hamiltonian. The 
wavefunctions for each Bloch decomposed Hamiltonian satisfy twisted boundary 
conditions indexed by a vector k belonging to the so-called Brillouin zone. In 
order to compute physical quantities such as the electron density and total 
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energy in Kohn-Sham DFT, the Brillouin zone needs to be represented by a 
number of discrete k-points. This is called Brillouin zone sampling. We refer 
readers to section 3 as well as [21] for more details of the Bloch decomposition 
and Brillouin zone sampling. In particular, the scheme using one special k 
point, denoted by T = (0,0,0)^, to sample the Brillouin zone is called the T 
point sampling scheme. The SCDM procedure proposed in Ref. m is applicable 
to Kohn-Sham DFT calculations for isolated molecules, and for solids with T 
point sampling of the Brillouin zone. 

In many physics and materials science applications such as chemical bond¬ 
ing analysis of complex solids, band structure interpolation, and Berry-phase 
theories, localized orbitals need to be constructed from Kohn-Sham orbitals ob¬ 
tained from a set of k-points in the Brillouin zone other than the F point. The 
number of k points needed is system dependent, and can range from tens to 
tens of thousands. The common practice for Brillouin zone sampling is to di¬ 
agonalize the Kohn-Sham Hamiltonian matrix for each k-point independently. 
Since the Kohn-Sham Hamiltonian matrix is in general complex Hermitian, the 
Kohn-Sham orbitals obtained for each k-point can acquire an arbitrary phase, 
often referred to as the “gauge” of the orbitals. For degenerate orbitals (i.e. 
orbitals with the same eigenvalue) the gauge can be an arbitrary unitary ma¬ 
trix. The widely used method for finding MLWFs |S] is gauge-dependent. It 
involves the differentiation operator with respect to the Brillouin zone index 
k. Therefore a gauge transformation needs to be performed prior to the min¬ 
imization procedure to smooth the gauge, so that the differentiation operator 
is well defined |4]. Such a gauge smoothing procedure is not unique. After the 
gauge transformation, the computation of MLWFs requires the minimization of 
a nonlinear, non-convex energy functional. Therefore, the iterative procedure 
may get stuck at local minimum. Furthermore, the nonlinear energy functional 
and its solution can depend heavily on the initial guess. This is especially the 
case for materials where within the unit cell there is complex atomic structure. 

In this paper, we generalize the SCDM procedure for finding localized or¬ 
bitals to solids with k-point sampling. The new method, which we refer to as 
SCDM-k, has a few notable features. First, the localized orbitals are obtained 
directly from columns of the density matrix, which is a gauge invariant quantity. 
Thus, SCDM-k does not require a gauge transformation, and the result is inde¬ 
pendent of the choice of the gauge. Second, SCDM-k is a direct method that 
does not involve an iterative optimization procedure and thus avoids getting 
stuck at a local minimum. Third, SCDM-k has only one parameter to adjust 
(size of the local supercell), which is introduced to improve the efficiency, and 
our numerical experiments indicate that the quality of the localized orbitals 
is relatively insensitive to the choice of this parameter. Finally, the SCDM-k 
procedure is highly efficient. The complexity for generating non-orthogonal and 
orthogonal localized orbitals is 0{N) and ©(iVloglV), respectively where N is 
the total number of k-points in the Brillouin zone. Therefore the method is suit¬ 
able even when a large number of k-points are used for sampling the Brillouin 
zone. 

The paper is organized as follows. Section outlines the notation and some 
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concepts that will be used throughout this paper. Section then outlines the 
procedure for solving Kohn-Sham DFT with Brillouin zone sampling. After 
briefly reviewing the SCDM procedure for the F point case, we describe in sec¬ 
tion]^ the SCDM-k procedure for k-point sampling. Finally, section presents 
numerical results in two and three dimensions using a model potential and is 
followed by concluding remarks and future directions in section 

2. Preliminaries 
2.1. Notation 

A relatively self-contained discussion of k-point sampling requires the intro¬ 
duction of a considerable amount of notation. Tablesummarizes the requisite 
notation that we will be using throughout this manuscript. We also provide a 
brief overview of some of the more pervasive notation used throughout the rest 
of the text, and introduce the remainder as needed. In the discussion below, 
without loss of generality we assume the dimension d = 3, and the formulation 
can be easily extended to d = 1 or d = 2 . 

We denote a unit cell by f2“. The global supercell, denoted by fl®, contains 
N 1 XN 2 X N 3 unit cells equipped with periodic boundary conditions. Due to the 
translational symmetry, the problem on a global supercell can be equivalently 
decomposed into Ni x N 2 x N 3 independent problems on a unit cell, each repre¬ 
sented by a k-point in the Brillouin zone using a Monkhorst-Pack grid [21] . One 
important component of the SCDM-k method is the so-called local supercell 
associated with the unit cell D“. A local supercell is comprised of x iV| x 7V| 
adjacent unit cells < Ni,i = 1,2,3). Figure illustrates the described 
relationship between the unit cell, local supercell, and global supercell in a two 
dimensional setting. 

Computationally, each k-point problem can be solved with any suitable dis¬ 
crete basis set. Below we assume a planewave basis set is used with Mi x M 2 x M 3 
grid points in reciprocal space. This corresponds to a set of grid points of the 
same size in real space discretizing 12“ uniformly. With a slight abuse of no¬ 
tation, this set of discrete grid points in the unit cell is also denoted by f 2 “. 
A similar abuse of notation is used for the global supercell D® and the local 
supercell 

To present the algorithms generally, we allow for distinct numbers of points in 
each of the three dimensions. This is the case for both the real space grid of the 
unite cell, and the k-point grid. However, often the asymptotic computational 
cost will only be a function of the product of the number of points per dimension. 
Therefore, we use capital letters with subscripts such as Ni,N 2 , to define the 
number of points per dimension and the same capital letter sans subscript such 
as N to denote the total number of points. In addition, we use calligraphic 
letters such as K, to denote sets. These will be used for operations such as 
general indexing of matrices or summation. 
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Notation 

Description 


Unit cell; Collection of indices corresponding to uni¬ 
form real space grid points in the unit cell 


Local supercell; Collection of indices corresponding 
to uniform real space grid points in a local supercell 

na 

Global supercell; Collection of indices corresponding 
to uniform real space grid points in a global supercell 

1 

Imaginary unit V—1 

Gl, 62 ,03 

Unit vector along each dimension 

S = (Si, S2, S3) 

Shift vector of the Monkhorst-Pack grid 

k = {ki,k2,k3) 

A k-point 

Ml , M2, M3 

Number of uniform grid points in the unit cell along 
each dimension 

Ni,N2,N3 

Number of k-points for Brillouin zone sampling along 
each dimension 


Number of unit cells in a local supercell along each 
dimension 

Li,L2, L 3 

Length of the unit cell 0" along each dimension 

M 

Affi X M 2 ^ ^3 

N 

Ni X N2 X N3 

~1^ 

Nf xN^xN^ 

1C 

Collection of all the k-points of the Monkhorst-Pack 
grid corresponding to the global supercell 


Collection of all the k-points of the Monkhorst-Pack 
grid corresponding to a local supercell 

rib 

Number of wavefunctions in the unit cell 

V’6,k 

A Kohn-Sham orbital on the global supercell U®, 
both at the continuous and the discrete level 

Vb,^ 

A Kohn-Sham orbital on the local supercell both 

at the continuous and the discrete level 

UbM 

Periodic part of a Kohn-Sham orbital on the unit cell 
U“, both at the continuous and the discrete level 

Pk 

Density matrix corresponding a particular k point 
on the global supercell 0® at the discrete level 

P 

Total density matrix on the global supercell 17® at 
the discrete level 

c 

Collection of indices for the selected columns on the 
unit cell 0" 

C3 

Collection of all indices for the selected columns on 
the global supercell 17® 


Table 1: Notation used in the paper 
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Figure 1: Illustration of the relationship in two dimension {d = 2) between 
the unit cell (shaded red), local supercell (shaded blue), and global supercell 
corresponding to iVi = A ^2 = 6 and Nf = = 2. 


2.2. Sub-selection of matrices 

Because we deal with very large matrices that exhibit structure due to the 
underlying problem set up, it is very useful for us to associate quantities from the 
physical problem with portions of matrices. Therefore, we use set subscripts to 
denote sub-selection of rows and columns of matrices. For example, ^{ 1 , 2 },{ 3 , 4 } 
is a submatrix of A consisting of the intersection of rows one and two with 
columns three and four. We use as a subscript to denote that all rows or 
columns are considered, i.e. A,.! denotes the first column of A. 

2.3. Column-pivoted QR factorizations 

Because they play a central role in the development of our algorithms, we 
briefly introduce column-pivoted factorizations. Consider A G where 

the sizes have been chosen to coincide with the matrices we will actually be per¬ 
forming these factorizations on later. A QR factorization with column pivoting 
(QRCP) of A follows the algorithm in [? ] to compute an M x M permutation 
matrix 11, a x Ub orthogonal matrix Q, a nt x Ub upper triangular matrix R 
and a Ub x {M — Ub) matrix T such that 

An = Q [i? T]. 

The column pivoting algorithm in [? ] is a greedy procedure to try and 
ensure that R is as well conditioned as possible and that its singular values do 
not differ too much from those of A. If we let C denote the original indices of the 
Ub columns permuted to the front by H then we observe that A, c = QR- Hence, 
if R is well conditioned we expect these columns to form a well conditioned 
basis for the range of A. Finally, What we have presented here is a very narrow 
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definition of such factorizations, and we direct the reader to [? ] and [? ] for a 
more through treatment of such algorithms. 

3. Kohn-Sham density functional theory with Brillouin zone sampling 

In this section we provide a relatively self-contained description of Kohn- 
Sham DFT, and focus particularly on Brillouin zone sampling due to the peri¬ 
odic structure. A more detailed discussion can be found in, e.g.^ [^ . 

3.1. Continuous formulation 

For a crystalline solid modeled by a global supercell consisting of N unit 
cells with each unit cell containing 2 nb electrons (the factor of two comes from 
spin), the Kohn-Sham equations are [5] 

Hipa{r) = + V{r)ipa{r) = Eaipair), r e n®, a = l,...,nbN. 

( 1 ) 

Each eigenfunction tpa satisfies the Born-von Karman (BvK) boundary condi¬ 
tion, which is the periodic boundary condition on 

tpa{r + = fjair), Vr G 0®, i = 1,2,3. (2) 

Using the BvK boundary condition, all eigenvalues {£„} are real, and all eigen¬ 
functions {tfa} are orthogonal to each other under the inner product. We 
assume the eigenvalues are ordered in a non-descending manner. 

Kohn-Sham DFT requires solving for the UbN eigenfunctions associated with 
the lowest eigenvalues {£a}'^=i- £a is called a Kohn-Sham orbital energy, and 
tpa is called a Kohn-Sham orbital or a Kohn-Sham wavefunction. In Kohn-Sham 
DFT, V (r) is the self-consistent single particle potential, and self-consistency is 
usually reached through an iterative procedure. Here without loss of generality 
we assume self-consistency has been reached. We also assume a pseudopotential 
is used so U(r) is smooth and can be discretized using uniform grid points. 
Nonlocal pseudopotentials are neglected for simplicity of notation, and do not 
introduce any extra numerical difficulty when added. We refer readers to [35] 
for more detailed explanation of the terminology. 

For crystalline solids, U(r) is a periodic function in 12“, i.e. 

V(r + L,e,) = V(r), Vr G i = 1,2,3. (3) 

The Bloch theory (or Bloch-Floquet theory) states that the UbJV Kohn-Sham 
wavefunctions can be relabeled using two indices a = (b,k), so that ipa = V'fe.k 
can be decomposed into the form 

V'b.k(r) = e**"''’Mf,,k(i’), (4) 

where 'Uf,,k(r) is the periodic part of '0b.k(r) satisfying 

ut,k(r + UiGi) = ut^k(r), Vr G U®, z= 1,2,3. (5) 
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Using the Bloch decomposition, Eq. Q can be written in terms of u on each 
unit cell as 

- i(V+ik)2ub_k(r) + U(r)ub^k(r) = eb_kMb_k(i'), r S U“, (6) 

Where each k is a point in the first Brillouin zone defined as 


TT 

IT 


TT 

TT 


TT 

TT 

7^1 


X 

1 

1 

_1 

1 < 7 ^ 

X 

1 

1 

_1 

- 1 

1 


For each k-point, b = 1,... ,n(,, i.e. rib is the number of wavefunctions per k- 
point. For simplicity we will drop the b subscript when describing properties 
that hold for each Ub,k(r), b = There are a few k-points in the 

Brillouin zone that play special roles in crystallography and also in numerical 
computation. The most important one is the T point, which is the origin of the 
Brillouin zone. 

In order to solve Eq. Q, the Brillouin zone B needs to be discretized. One 
of the most widely used discretization schemes is the so-called Monkhorst-Pack 
grid [2l], which uses a uniform discretization of B. The discretized set of k- 
points is 


1C = 


f 27rji 27rj2 27rj3 

N 2 L 2 N^L'i 


■ - ^ 1 • - 1 o a 

Ji — 2 ^ 2 ^ ^ ^ 


(7) 


where s is a shift vector, and we assume Ni is an even number. Two common 
choices are s = (0,0, 0) (no shift) and s = ((half grid shift). 
It should be noted that the inclusion of a non-zero shift vector could violate 
the BvK boundary condition. But this only adds an optional post-processing 


procedure for handling the phase vector and will be discussed in section 4.5 For 
now on we assume s = (0, 0, 0), and the BvK boundary condition holds because 


l/'6,k(r -k NiLie,) = + NiLiSi) = = l/'6,k(i’)- 


The last equality holds because 


„ik-NiLiei 


= 1 , 


which is satisfied for k G Ail. 

3.2. Discrete formulation 

For each k G 1C, Eq. is solved numerically for b — l,...,nb, and we 
assume the resulting eigenfunctions are solved for and represented on a uniform 
grid discretizing the unit cell 


r 

j2L2 

j^L-i \ 


M 2 

M 3 ) 


7^ = 


j, = 0,...,M,-1, z = I,2,3|. (8) 




















Then each eigenfunction U{,,k(r) is represented as a column vector. With some 
abuse of notation, this column vector is still denoted by Ub^k € and 

M6,k(j) = u&^k(rj),rj e 'R- 

Since the effectiveness of the technique presented in this paper is heavily 
based on numerical linear algebra procedures such as QRCP factorizations and 
discrete Fourier transforms, it turns out that using discrete variable indices such 
as j is more convenient than the continuous variable indices such as r. Therefore 
we will use discrete indices when possible for the rest of the paper. Furthermore, 
we let 

= {(ii, J2, J3)|ji = 0,..., M, - 1, i = 1,2,3} 

denote the set of indices corresponding to real space grid points TZ in the unit cell. 
The periodic boundary condition on 17“ allows us to interpret j and j + as 
equivalent points {i = 1,2,3). The periodic eigenvector Ub^k satisfies the discrete 
orthonormal condition 


= Sb,b'5kM'- (9) 

jeO" 


Again, we denote by 


ns = = i = l,2,3} (10) 

the corresponding set of indices of real space grid points in the global supercell. 
Similar to before, the periodic boundary condition on 17® allows us to interpret 
j and j + NiMiBi as equivalent points {i = 1,2,3). The discretized eigenfunc¬ 
tion '!/'&.k(j) is periodic on the global supercell 17®, and satisfies the discrete 
orthonormal condition 


X! V’b,k(j)'*/'b',k'(j) = Sb,b’Sk.,k’N- (11) 

jens 


Note that the convention taken in Eq. (101 places the origin of the unit cell 
17“ at the origin of the global supercell 17® as well. This is allowed due to the 
periodicity of the global supercell. 

We now introduce a key concept for the development of our algorithm, the 
discrete density matrix. Notationally, it is denoted by Pk G ([^iMN)x(MN) 
for each k-point is defined as 


, rib 

^'k(j,j') = Xl^bk(jXk(j'): jJ'Gl7®. (12) 

Here * stands for the Hermitian conjugate operation. The complete discrete 
density matrix is then defined as 

^(j,j') = E ^k(j,j'), j,j'Gl7®. (13) 

kG/C 


9 



It is easy to verify that Pk and P satisfies the normalization conditions 


Tr Pk = ni, and Tr P = Nub ■ 


The following block circulant property of the density matrix plays an impor¬ 
tant role in the SCDM-k method for constructing localized orbitals. 

Proposition 1. P satisfies the block circulant property, i.e. 



Proof. It is sufficient to show that each Pk satisfies the block circulant property. 
For any i = 1,2, 3, 




Therefore, each Pk is block circulant. Here we have implicitly used the afore¬ 
mentioned periodic structure over 11® to address when j -|- MiSi or j' -f MiBi 


yields a point outside the boundary of fl®. 


□ 


4. Selected columns of the density matrix 

Once the Kohn-Sham equations have been solved numerically, we have the 
means to construct a set of Nub eigenfunctions encoded as columns of 4'® over 
the global supercell H®. However, the functions will be delocalized spatially. 
We now outline a construction for computing Nub localized eigenfunctions over 
the global supercell that span the same space as 'k®. Notably, the matrix of 
all Nub vectors over NM spatial points may be prohibitively expensive to even 
store. As a consequence of this, we provide algorithms that construct Ub of these 
localized functions associated with a single unit cell. The periodic structure of 
the problem means this is sufficient for our needs. 

4.1. Review of the SCDM procedure for V point calculation 

In order to present the SCDM-k method, we first briefly review the proce¬ 
dure for localizing Kohn-Sham orbitals via the SCDM procedure for T point 
calculations, of which the details can be found in Ref. m- In order to remain 
notationally consistent within this work, we use slightly different notation than 
in [IB] . 

We present the SCDM method as if Kohn-Sham orbitals are only defined on 
a single unit cell 17“ with the one k point, the so-called the F point, i.e. N = 1. 
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Let {V'alaLi represent the rib Kohn-Sham orbitals discretized on a uniform grid, 
and collected as columns of the matrix dr. We leverage the fact that the density 
matrix P = d'd'* has well localized columns for insulating systems [19], and 
use columns of P as a starting point for constructing a localized basis. We are 
always interested in finding a representation of the entire Kohn-Sham invariant 
subspace and thus construct rib localized orbitals. 

Algorithm [^presents the SCDM algorithm in its simplest form, providing a 
unitary transform from dr to a set of orthogonal localized orbitals d>. Here we 
see that the algorithm computationally amounts to a single QRCP factorization. 
This factorization can be computed, e.g., via the qr function in MATLAB [? ] 
or the DGEQP3 routine in LAPACK [? ]. 

To motivate the algorithmic developments here, we also present a slight 
variation on Algorithm]^ Specifically, we assume that we do not have access to 
the matrix Q. Instead we simply have a set of rib columns that the permutation 
matrix H chose to move forward during the QRCP process. It turns out that 
this information is sufficient to generate a localized basis. This variation is 
presented in Algorithm where we first compute the set C and then construct 
the relevant columns of d^d^*. These columns are themselves well localized but 
they are not orthogonal, which is a desirable property. Fortunately, because 

d'd' = P,c {Pc.cr^P*fi 

we may orthogonalize P^c using the square root of {Pc,c) The rapid de¬ 
cay away from the diagonal of Pc,c ensures that the resulting orthogonalized 
columns remain well localized. 


Algorithm 1 Simple algorithm for SCDM with T-point calculation. 

Input: The orthonormal Kohn-Sham orbitals d'. 

Output: The orthogonalized SCDM d>. 

1: Compute the QRCP factorization tf*!! = QR 
2: return $ = d>Q 


4-2. SCDM with Brillouin zone sampling 

When we had a single k-point we sought to compute a set of Ub localized 
functions in the cell associated with that k-point. Now, we may view our spatial 
domain as consisting of N unit cells, consequently we seek to compute Nub 
localized functions over the entire spatial domain Q®. The most straightforward 
way to accomplish this would be to simply treat a larger problem, one with 
NM spatial unknowns and Nub eigenfunctions denoted by df®, as input to 
Algorithmic or In this case, the global density matrix d*® (df®)* would exhibit 
the locality we desire. However, this could be prohibitively expensive since 
the QRCP computation would scale as 0{N^). However, conceptually such a 
procedure can serve as a point of comparison for the performance of our new 
algorithm. 
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Algorithm 2 Algorithm for SCDM with F-point calculation. 

Input: The Kohn-Sham orbitals 'I'. 

Output: The SCDM i* or the orthogonalized SCDM $. 

1: Compute the QRCP factorization tf*!! = QR 

2: Let C denote the original indices of the first n;, columns selected by D. 
3: Compute the SCDM $ = P-fi = which are localized orbitals. 

4: if the orthogonalized SCDM are desired then 
5: Compute ■ 

6: Compute the orthonormal orbitals as = 4> {Pc,c)~^^^■ 

7: return >1> 

8: else 

9: return 'h 

10: end if 


To overcome this obstacle, we appeal to the block circulant property in 
Proposition Applying our algorithm to 'F® directly we would expect that 
the index set C, of size Auh, would contain nt columns associated with each 
unit cell $7“. Therefore, we will solve a smaller problem to extract rib columns 
corresponding to a single unit cell. We then construct a set denoted C® that 
may be used for the entire global supercell by simply adding the translates 
of these columns into the other unit cells, with N unique translates of the rif, 
columns this yields a set of the desired size. Once we have this set of columns 
of the global density matrix, we leverage the block circulant structure of P to 
efficiently perform the orthogonalization in a manner analogous to lines 6 and 
7 of Algorithm 

The set C® constructed by the above procedure does not necessarily coincide 
with the columns of P we would select if we used our existing algorithm directly 
on the global problem. However, we make the assumption that while the detailed 
shape of the columns of the density matrix requires a relatively large number 
of k-points to resolve, the actually selection of columns in a given unit cell is 
relatively insensitive to the number of k-points used. Since Ub k is discretized 
on a fine real space grid, even if the columns selected shift by a few grid points 
the resulting columns of the global density matrix should still be well localized. 
Numerical experiments in section corroborate this intuitive argument. 

We now explicitly introduce the small local supercell where 17“ C C 17® 
(Figure]^ used in to pick the rib columns of C in a single unit cell. With a similar 
abuse of notation as before, is also used to denote the indices of grid points 
in the local supercell. The number of unit cells in 17^ along the Tth direction is 
denoted by Nf. Following the same convention as in the definition of the global 
supercell in Eq. ( |Io| ), the grid points in 17^ are 

^^^ = {(ji,i2,j3)|ji=0,...,A/M,-l, i = l,2,3}, (14) 

which places the origin of the unit cell 17^ also at the origin of the global supercell 
17®. 
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4.-3. Computing columns of the density matrix 

We now discuss the construction of selected columns of the density matrix 
that are localized. Let qi = Ni/N[. If qi is an integer for i = 1,2,3 then the 
Kohn-Sham equations on the local supercell can be simply solved through a 
coarse sampling of the Brillouin zone. The resulting collection of grid points in 
the Brillouin zone is denoted by 


IC^ 


( 27rji 27rj2 STrjg \ . ^ ^ 

\NlLi N^L2 Nl^L^J 2 ’■■■’ 2 



(15) 


Since the local supercell is only used as a numerical tool to select columns more 
efficiently, we make an additional approximation by discarding the shift vector 
s when reconstructing the Kohn-Sham orbital from its periodic part Uh,k, 
and all satisfying the BvK boundary condition in Uf. 

After solving the Kohn-Sham equations on the local supercell via coarse 
sampling of the Brillouin zone, we get orthonormal wavefunctions and 

arrange them into the columns of the N^M x N^Ub matrix . We now apply 
the SCDM procedure for T-point calculations as outlined in Algorithm to 
which selects UbN^ columns denoted C^. We then restrict this larger set of 
selected columns to the set C“ which contains the Uf, columns associated with 
points inside a single unit cell 

Given C“, we may compute the respective selected columns of the density 
matrix on the entire global supercell as P(j, c), j S fl®, c S C“. We first construct 
-Pk(j,c),j G fl“,c e C“ as 


, 

^ 6=1 


Then for j € note that for any = 0,..., — 1, z = 1, 2, 3, we have 


Pk(j + n,M,e„c) J ^ g 


=*k-(rj-r„) 


w&.k(j + n^M^ei)ul^^(c) 


b=l 


= e*k'("*^-e.)p^(j^c). 


(17) 


Therefore, Pk can be constructed just by multiplying Pk(j,c), j G 11“ by phase 
factors. Summing up Pk(j,c)’s for all k we obtain P(j,c). For convenience we 
let Pc G denote the matrix elements 


-P:.C“ (j,&) = ^’(j,C6), j G G C“. 

The preceding discussion is summarized in Algorithm and yields the desired 
selected columns of the density matrix over the global supercell. Note that this 
corresponds to computing only rib of the Nnb total localized functions we expect. 
If desired, the others may be constructed by translating C“ into a different unit 
cell, see the following section for details. 
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Algorithm 3 Computing the (non-orthogonal) selected columns of the density 
matrix inside a unit cell. 

Input: Monkhorst-Pack points in the Brillouin zone Ai; 

Periodic parts of Kohn-Sham orbitals {u„k} for n = and 

ke/C; 

Sub-sampling Monkhorst-Pack points in the reciprocal space fC^; 


Output: Non-orthogonal SCDM associated with the unit cell 


1 

2 

3 

4 

5 

6 

7 

8 


Construct from {M„k} with k £ from the Bloch decomposition 
Compute the QRCP factorization 11 = QR 

Let denote the original indices of the first N^rii, columns selected by 11. 
Find the selected column indices C“ = {j £ C^\j £ fl“} in the unit cell. 

for all k do 

Construct Pk(j,c) for j £ and c £ C“ via ( |16[ ). 
end for 

Compute P(j, c) for j £ fl® and c £ C“ using (171. 


4.4- Construct the orthonormalized SCDM 

We now have a procedure to compute C“ and due to the block circulant 
property of the density matrix, this is sufficient. All the remaining columns are 
the block translates of these columns into the other unit cells in . Define the 
collection of indices 


C® = {c-I-(niMi,n2M2,n3M3)|c £ C, = 0,..., - 1, i = l,2, 3}, 

which induces a matrix P-,ca € ^(MN)x(nbN) 

Pxcn ilb) = P{j,Cb), jGD®,CbGC®. 

These columns of P are precisely the Arif, columns that form our localized basis 
over the entire global supercell. However, they are not orthonormal and we now 
describe an efhcient procedure to orthonormalize them. 

The matrix P-.fig is block circulant when viewed as an A x A block matrix 
with each block of size M x Ub- Note that the storage cost of Peg is 0{nbMN'^), 
and it is therefore never explicitly computed or stored. We also define the matrix 
Peg,eg G as 

PC!!,C!! {a,b) = P {Ca,Cb) , Ca,Ct,GC® 

and note that Tfcs.cs is also block circulant when viewed as an A x A block 
matrix with each block of size Ub x Ub- 

Now, if we consider the matrix <1)® G defined as 

= P-.,c^{Pco,c^rK (18) 

it is also block circulant and satisfies the discrete orthonormality condition 

($ 9 )*$® = /. 
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Therefore, represents an orthonormal set of localized basis functions across 
all of the k-points. Due to the translational invariance, we only need to compute 
the columns of d)® centered in D“, and as before, the remaining columns are just 
the block translates of these columns into the other unit cells in D®. Similar 
to P-.fig, the entire matrix d>® is neither explicitly constructed nor stored in 
practice. 

In fact, all of the rows of P-.^g and d*® may be constructed from knowledge of 
the first M rows, i.e. those associated with a single unit cell. More specifically, 
if j S and j' € D® are such that 

j' = j + (niMi,n2M2,n3M3) 
for some ( 711 , 712 , 713 ), then for any Cf, S C® 

= d>® (j,Cf, - {niMi,n2M2,ni,Mz)) ■ (19) 

To make this explicit notationally, let Pn",c» G ^{M)x(nbN) such that 
-Fh“.cs (j,&) =-P(j,Cf,), jeD“,CbGC®. 


We define d?" as 


d>“ = PQgfig (Pcgfig) 


and since we may construct d>® from d>“ via Eq. (191 we now focus on the 
construction of d>“. 

Direct computation of the matrix square root of Peg,eg G ([^('n-bN)x{nbN) 
Eq. (18) costs 0{N^) and is hence computationally expensive. Instead, we 


demonstrate an algorithm to take advantage of the block circulant property 
of Peg,eg that reduces the complexity to C>(A^logiV). Let F be the matrix 
representing the three-dimensional discrete Fourier transform that acts with 
respect to the n = (rii, 712 , 713 ) index in the set C® and F~^ = F* be the three- 
dimensional inverse discrete Fourier transform that acts with respect to the 
Fourier space grid corresponding to n. Computationally these operations are 
performed via the fast Fourier transform. Observe that. 


=Peig,egF~^F(.Peg,eg)~^^ F~^F 
= {Png,egF-^) {FPeg,egF-^)~^ F. 


( 20 ) 


We can move the square root outside F and F ^ because F [Peg,eg) ^ F ^ and 
_ 1 

{FPeg,egF~^^ ^ have exactly the same eigenvalues and eigenvectors. This is 
a consequence of the fact that F is unitary and Peg,eg is Hermitian positive 
definite. 


Eq. (20 1 compactly represents the procedure for accelerating the computa¬ 


tion of the orthogonalized SCDM and we now elaborate on their construction. 
First, we observe that the matrix 


Peg,eg — FPeg ,eg F 


-1 


( 21 ) 
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is block diagonal with TV blocks each of size rift X rife. This means that, [Pcafig 1 
may be computed by taking the inverse square root of each small block inde¬ 
pendently. The block diagonal structure follows immediately from the fact that 
Pea fig is block circulant and the use of the discrete Fourier transform. Fur¬ 
thermore, the diagonal blocks may be computed by applying F to the first rib 
columns of Pcafia , which means those are the only columns we need to explicitly 
construct. Finally, since there are N diagonal blocks in total, we may associate 
each of the diagonal blocks with an index n. Therefore, we refer to a given 
diagonal block of Pcafia as Pcafia-n- 
We now let 

Pn-fig =Pn-figF-\ (22) 

which may be constructed by applying F to the columns of (Pn^fia)*■ Finally, 
similar to before we let Pn'^fia-n denote a M x rib matrix containing rib of the 
columns of pQ^fig associated with a given index n. We define 

(23) 

and once again let denote a M x rib matrix containing rib columns of 
associated with a given index n. 

The use of a single index allows us to compactly write the construction of 
l>“ as 

K = Pa^c-n(Pc^c-n)”'^^ (24) 

We may then form the first M rows associated with a single unit cell of 4)® 
by applying F~^ to the columns of 4“. Algorithm summarizes the steps for 
constructing the orthogonalized SCDM centered in a single unit cell. 


Algorithm 4 Computing the orthonormalized SCDM from the non-orthogonal 
SCDM._ 

Input: Monkhorst-Pack points in the Brillouin zone {k}; 

Non-orthogonal selected columns of the density matrix Pc', 

Output: Orthogonal SCDM <l>® associated with the unit cell D“. 


1: Compute the first rib columns of Pcafia, and Peta^fia. 
2 : 


Compute Pcg,ca via Eq. (211. 

^ ' 

Compute {^Pcg,ca\i^ 


for all n. 


Compute Pn^fig = (F(Pn“,cs) ) . 

/ - \-i/2 

Compute 'I>“ = Pc“;n (^Pcs,C3;n 1 
Compute <E>“ = (p-i ($“)*)*. 


for all n. 


Construct <I>®(:,C) from <E>“ via Eq. (19l. 
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^.5. Post-processing for the shift vector in the Monkhorst-Pack grid 

Here we address the issues that arise when using the Monkhorst-Pack grid 
with half grid shift s. In such case, each orbital ipa does not satisfy the BvK 
boundary condition in the global supercell Instead, 

N,Liei) =-fjair), Vr e H®, i = 1,2,3, (25) 


and a phase factor (—1) is gained. To be accurate, this phase factor needs to 
be taken into account in the SCDM. Note that 

P(j, c) = ^ c) = c). (26) 

k 


Here Pk and P are the density matrices obtained by taking s = (0,0, 0). There¬ 
fore the post-processing only requires multiplying each column of the SCDM 
P(j,c) and the orthonormalized SCDM ^ by a phase vector It is 

straightforward to verify that the post-processing procedure (26) maintains the 
orthonormality of the orthonormalized SCDM. 


4-6. Complexity 

The computational complexity for selecting the SCDM using the QRCP fac¬ 
torization with a local supercell is Mn\). Then, the complexity for com¬ 

puting the non-orthonormal SCDM Pns.C" via matrix-matrix multiplication is 
0{MNnl). Due to the usage of Eq. the cost for computing Pc(j,c),j G 
is only 0{MNnb). Importantly, M and nt are assumed to be fixed with respect 
to the increase of the number of k-points (i.e. the number of unit cells contained 
in the global supercell). Furthermore, we explicitly set to be small and not 
grow with N. Therefore, the cost for obtaining the non-orthogonal SCDM is 
0{N). 

In order to generate the orthonormal SCDM, the cost for computing the 
Fourier transform FPcsfisF~^ is 0{N\og{N)n\)^ and the cost for computing 
the matrix square root of the block diagonal matrix FPcgfigF~^ is 0 {Nnl). 
The cost for computing the first block row of (PcsP~^) is 0{N\og{N)Mn}f) 
and the cost for multiplying with matrix square root is 0{NMnl). Therefore, 
the total cost for generating <i)®(j,c),j G G is 

0{Nlog{N)nl + Nnl + N\og{N)Mn^, + NMnl)). 

Finally, the cost for the post-processing by multiplying a phase vector is 0{MNnb). 
If we take the leading term with respect to iV, M and think of nt, as a small 
constant, then the complexity of the whole algorithm is 

O (iV log(A^)Mn6 -k NMnl + ■ 

Hence, the complexity for obtaining the orthonormalized SCDM with respect to 
the number of k-points used for sampling the Brillouin zone is only 0{N log N). 
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5. Numerical examples 

To illustrate the performance of the SCDM-k algorithm, we consider the 
localization of orbitals obtained from model potentials in two and three dimen¬ 
sions. In three dimensions, the model potential takes the form 

Ni-l N 2 -IN 3 -I / 3 \ 

^ K U-^n,e, j . (27) 

m—O 712—0 n^—O \ 2=1 / 

Here Vu{r) is taken to be a Gaussian potential centered at the origin in the unit 
cell modeling an atom, i.e. 

14 (r) =-4.0e”^. (28) 

The shape of the model potential in two dimensions is similar, 

N 1 - 1 N 2-1 / 2 \ 

r-^n,ei|. (29) 

m—O 212=0 \ 2—1 / 

We generally set cr = 1 and will explicitly state when we use a different value. 
5.1. Shapes of the SCDM 

We first present numerical examples illustrating the shapes of the SCDM in 
two and three dimensions. In the two dimensional case, we set Li = L 2 = 6.0, 
Ml = M 2 = 32, rib = 3, and use Ni = N 2 = 8 k-points per direction. For this 
example we set cr = 0.8. Setting rib = 8 nieans that we should have one orbital 
that behaves similar to an s-orbital (spherical) and two orbitals that behave 
similar to a p-orbital (non-spherical). Figure shows the shape of SCDM and 
the orthonormalized SCDM. We only plot the three SCDM for a single k-point 
near the middle of the domain and observe the rapid decay of both the SCDM 
and the orthogonalized SCDM away from their unit cell. 

In three dimensions, we use Ni = N 2 = ^ k-points per direction, set 

Ml = M 2 = M 3 = 20, and let rib = 4. Similar to the 2D case, we expect 
that there should be one orbital that behaves similar to an s-orbital and three 
orbitals that behave similar to a p-orbital. Figure shows the SCDM and the 
orthogonalized SCDM. Here we only plot isosurfaces for the four SCDM for a 
single k-point near the middle of the domain. As in the two dimensional case, 
we see that the bulk of the orbital is well localized within a single unit cell. It is 
also even more apparent than in the 2 d case that our localized functions match 
the expected s- and p-orbital structure. 

In both of the preceding cases, the well localized orbitals also imply that the 
matrix Pcafis and hence the matrix FPcq are well conditioned, and the 
computation of the matrix square root does not cause any numerical problems. 
In fact, the condition number of the small block matrices we have to take the 
inverse square root of was less than five in the two dimensional example and 15 
for the three dimensional example. 
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5.2. Locality of the SCDM 

Figure 1^ and show that the SCDM are qualitatively very well localized. 
We now quantify this by measuring the locality of the functions systematically. 
Specifically, we construct the SCDM and the orthogonalized SCDM and then, 
over the global supercell fl®, measure the fraction of entries (denoted by nz%) 
where the relative magnitude of the functions is above a given threshold e. 

First, we measure the locality for the two dimensional problem. We use 
Ni = N 2 = 16 k-points in each direction, rib = 3, and Mi = M 2 = 40. 
Figure]^ shows the average locality for both the SCDM and the orthogonalized 
SCDM. The diameter of the localized region is proportional to the square root 
of the volume, and thus is proportional to consequently this is the 

quantity we choose to plot. Importantly, here we see that the orthogonalization 
does not severely impact the localization properties of the SCDM. Furthermore, 
when the relative truncation threshold is set to e = 10 “^ for the orthogonalized 
SCDM less than 1% of the entries remain non-zero. While in some cases here 
the orthogonalized orbitals are actually more localized, we do not necessarily 
expect such behavior in general and only expect that the orthogonalization step 
will not significantly reduce the locality. 

We now move back to the three dimensional problem and use Ni = N 2 = 
N 3 = 8 k-points in each direction, rib = 4, and Mi = M 2 = M 3 = 20. In Figure 
[^we plot average locality for both the SCDM and the orthogonalized SCDM. 
Analogously to before, the diameter of the localized region is proportional to 
the cube root of the volume, and thus we choose to plot Once again, 

the orthogonalization does not severely impact the localization properties of the 
SCDM and when the relative truncation threshold is set to 10“^ only about 
0.7% of the entries or the orthogonalized SCDM remain non-zero. 

5.3. Changing the local super cell size 

Rather than finding the selected columns using wavefunctions defined on the 
entire global supercell 11 ®, we find a good approximation to these columns by 
using a much smaller local supercell whose size does not increase as the size 
of the global supercell grows. In fact, this is the key approximation made by 
our algorithm, and the only source of “error” when compared to our existing 
methods. Here we quantitatively study the dependence of the quality of the 
columns selected by this local supercell approach. 

Concretely, we use a two dimensional problem with A^i = N 2 = 16 k-points 
in each direction, Ub = 3, and Mi — M 2 = 20. We proceeded to vary the size 
of the local supercell, Nf and A^|, and compute the SCDM and orthogonalized 
SCDM. We measure the locality as the fraction of non-zero entries after trun¬ 
cation at a relative magnitude of 10“^. Figureshows that the localization of 
both the SCDM and the orthogonalized SCDM is nearly constant as the local 
supercell size varies. 

To compare against our existing methods and validate our approximation, 
we let the local supercell size grow to match that of the global supercell. This 
corresponds to running Algorithmj^on the global problem and in Figurej^occurs 
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(a) 




(b) 




(c) 


Figure 2: Absolute value of the three SCDM (a) and orthogonalized SCDM 
(b) located in a single unit cell and plotted over the global supercell. Zoomed- 
in images of the significant regions of orthogonalized SCDM (c) showing the 
expected spherical (s-orbital like) and non-spherical (p-orbitals like) structure. 


at the rightmost point of the plot since Ni = N 2 = 16 and N( = A^| = 16. 
Therefore, we observe that there is no noticable error introduced by our new 
algorithm: we get functions that are just as localized as if we treated the global 
problem directly. 
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(b) 


Figure 3: Isosurface at a relative value of 0.1 of the absolute value of the SCDM 
(a) and orthogonalized SCDM (b) located in a single unit cell and plotted over 
the global supercell. 



Figure 4: Average fraction of nonzero entries after truncation for the SCDM 
and the orthogonalized SCDM in two dimensions. 


5.4- Scaling with the number of\i-points 

Finally, we demonstrate the computational scaling outlined in section 
Here we consider a three dimensional problem and increase the total number of 
k-points in each direction. In this experiment we used Mi = M 2 = M 3 = 10 and 
rib = 4. Figure shows the time taken to compute the orthogonalized SCDM 
as the number of k-points grows. In this case, the terms linear in N actually 
dominate the computation and we observe close to linear scaling. 
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Figure 5: Average fraction of nonzero entries after truncation for the SCDM 
and the orthogonalized SCDM in three dimensions. 
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Figure 6: locality as local supercell size grows 


6. Conclusion and future work 

We developed the SCDM-k method, which is a new method for finding both 
orthogonal and non-orthogonal localized orbitals from a set of Kohn-Sham or¬ 
bitals obtained from Brillouin zone sampling. The SCDM-k method is implicitly 
based on the use of the gauge invariant density matrix, and obtains localized 
orbitals without an iterative optimization procedure. Furthermore, the compu¬ 
tation exhibits 0{N log N) scaling with respect to the total number of k-points 
used. Numerical results for two and three dimensional systems with model po¬ 
tentials indicate that the SCDM-k method generates localized orbitals that can 
be visually similar to MLWFs. All routines used in the SCDM-k method are 
standard linear algebra routines and are thus easily parallelizable. This could 
enable efficient computation of localized orbitals for solids both in the post¬ 
processing step and performed on the fly. Though described using a uniform 
real space grid for example, the SCDM-k method does not rely on a particular 


22 






K-points 


Figure 7: Time taken to compute the orthogonalized SCDM as the number of 
k-points grows. The upper dotted line represents O(iVlogA^) scaling and the 
lower dotted line represents 0{N) scaling 


basis set and can be combined with any electronic structure software packages 
supporting k-point sampling in the Brillouin zone. We plan to apply the SCDM- 
k method to compute localized orbitals and compare directly with MLWFs for 
Kohn-Sham DFT calculations of real materials systems in the near future. 
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